A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes
Internal identifier: 002A10 (Main/Exploration); previous: 002A09; next: 002A11
Authors: Stefan Kurtz [Germany]; Apurva Narechania [United States]; Joshua C. Stein [United States]; Doreen Ware [United States]
Source:
- BMC Genomics [ 1471-2164 ] ; 2008.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- chemical : DNA Transposable Elements.
- methods : Computational Biology, Genomics.
- Genome, Plant, Methods, Oryza, Software, Sorghum, Zea mays.
Abstract
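The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks.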
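Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10^9 bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low-complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize. Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), k-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity.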
The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see http://www.zbh.uni-hamburg.de/Tallymer.
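To make the idea of k-mer frequency counting in the abstract concrete, here is a minimal sketch in Python. It is not Tallymer's method: Tallymer is based on enhanced suffix arrays, whereas this sketch uses a plain in-memory hash table, which only suits small inputs. The function names `count_kmers` and `repetitive_fraction`, the `min_count` threshold, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer on the forward strand of a DNA string.

    Hypothetical helper for illustration; a naive hash-based count,
    not the enhanced-suffix-array approach used by Tallymer.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows that contain ambiguous symbols such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def repetitive_fraction(sequence, k, min_count):
    """Fraction of counted k-mer positions whose k-mer occurs >= min_count times."""
    counts = count_kmers(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c >= min_count)
    return repeated / total


# Toy input (assumed, not from the paper): a short tandem-repeat-like string.
toy = "ACGTACGTACGTTTGA"
counts = count_kmers(toy, 4)
fraction = repetitive_fraction(toy, 4, 2)
```

In this toy case a few distinct 4-mers account for most positions, which is the same kind of skew the abstract reports for maize, where highly repetitive 20-mers cover a large share of the sequence while being a small share of the distinct k-mers.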
Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613927
DOI: 10.1186/1471-2164-9-517
PubMed: 18976482
PubMed Central: 2613927
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000551
- to stream Pmc, to step Curation: 000551
- to stream Pmc, to step Checkpoint: 001405
- to stream PubMed, to step Corpus: 002060
- to stream PubMed, to step Curation: 002060
- to stream PubMed, to step Checkpoint: 002018
- to stream Ncbi, to step Merge: 000650
- to stream Ncbi, to step Curation: 000650
- to stream Ncbi, to step Checkpoint: 000650
- to stream Main, to step Merge: 002A36
- to stream Main, to step Curation: 002A10
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes</title>
<author><name sortKey="Kurtz, Stefan" sort="Kurtz, Stefan" uniqKey="Kurtz S" first="Stefan" last="Kurtz">Stefan Kurtz</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Center for Bioinformatics, University of Hamburg, Bundesstraße 43, 20146 Hamburg, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Center for Bioinformatics, University of Hamburg, Bundesstraße 43, 20146 Hamburg</wicri:regionArea>
<wicri:noRegion>20146 Hamburg</wicri:noRegion>
<placeName><settlement type="city">Hambourg</settlement>
<region type="land" nuts="2">Hambourg</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Narechania, Apurva" sort="Narechania, Apurva" uniqKey="Narechania A" first="Apurva" last="Narechania">Apurva Narechania</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="I3">Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Stein, Joshua C" sort="Stein, Joshua C" uniqKey="Stein J" first="Joshua C" last="Stein">Joshua C. Stein</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Ware, Doreen" sort="Ware, Doreen" uniqKey="Ware D" first="Doreen" last="Ware">Doreen Ware</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">18976482</idno>
<idno type="pmc">2613927</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613927</idno>
<idno type="RBID">PMC:2613927</idno>
<idno type="doi">10.1186/1471-2164-9-517</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000551</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000551</idno>
<idno type="wicri:Area/Pmc/Curation">000551</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000551</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001405</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001405</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:18976482</idno>
<idno type="wicri:Area/PubMed/Corpus">002060</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002060</idno>
<idno type="wicri:Area/PubMed/Curation">002060</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002060</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002018</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">002018</idno>
<idno type="wicri:Area/Ncbi/Merge">000650</idno>
<idno type="wicri:Area/Ncbi/Curation">000650</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000650</idno>
<idno type="wicri:Area/Main/Merge">002A36</idno>
<idno type="wicri:Area/Main/Curation">002A10</idno>
<idno type="wicri:Area/Main/Exploration">002A10</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes</title>
<author><name sortKey="Kurtz, Stefan" sort="Kurtz, Stefan" uniqKey="Kurtz S" first="Stefan" last="Kurtz">Stefan Kurtz</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Center for Bioinformatics, University of Hamburg, Bundesstraße 43, 20146 Hamburg, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Center for Bioinformatics, University of Hamburg, Bundesstraße 43, 20146 Hamburg</wicri:regionArea>
<wicri:noRegion>20146 Hamburg</wicri:noRegion>
<placeName><settlement type="city">Hambourg</settlement>
<region type="land" nuts="2">Hambourg</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Narechania, Apurva" sort="Narechania, Apurva" uniqKey="Narechania A" first="Apurva" last="Narechania">Apurva Narechania</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="I3">Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Stein, Joshua C" sort="Stein, Joshua C" uniqKey="Stein J" first="Joshua C" last="Stein">Joshua C. Stein</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Ware, Doreen" sort="Ware, Doreen" uniqKey="Ware D" first="Doreen" last="Ware">Doreen Ware</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Lab, 1 Bungtown Rd, Williams #5, Cold Spring Harbor, NY 11724</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint><date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Computational Biology (methods)</term>
<term>DNA Transposable Elements</term>
<term>Genome, Plant</term>
<term>Genomics (methods)</term>
<term>Methods</term>
<term>Oryza</term>
<term>Software</term>
<term>Sorghum</term>
<term>Zea mays</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Biologie informatique ()</term>
<term>Génome végétal</term>
<term>Génomique ()</term>
<term>Logiciel</term>
<term>Méthodes</term>
<term>Oryza</term>
<term>Sorghum</term>
<term>Zea mays</term>
<term>Éléments transposables d'ADN</term>
</keywords>
<keywords scheme="MESH" type="chemical" xml:lang="en"><term>DNA Transposable Elements</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
<term>Genomics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Genome, Plant</term>
<term>Methods</term>
<term>Oryza</term>
<term>Software</term>
<term>Sorghum</term>
<term>Zea mays</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Biologie informatique</term>
<term>Génome végétal</term>
<term>Génomique</term>
<term>Logiciel</term>
<term>Méthodes</term>
<term>Oryza</term>
<term>Sorghum</term>
<term>Zea mays</term>
<term>Éléments transposables d'ADN</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of <italic>k</italic>
-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks.</p>
</sec>
<sec><title>Results</title>
<p>Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for <italic>k</italic>
-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the <italic>k</italic>
-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10<sup>9 </sup>
bp.). We analyzed <italic>k</italic>
-mer frequencies for a wide range of <italic>k</italic>
. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible <italic>k</italic>
-mers. Similar low-complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-<italic>C</italic>
<sub>0</sub>
<italic>t </italic>
derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy <italic>k</italic>
-mers, suggesting that transposons are still active in maize. Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), <italic>k</italic>
-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity.</p>
</sec>
<sec><title>Conclusion</title>
<p>The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see <ext-link ext-link-type="uri" xlink:href="http://www.zbh.uni-hamburg.de/Tallymer"></ext-link>
.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations><list><country><li>Allemagne</li>
<li>États-Unis</li>
</country>
<region><li>Hambourg</li>
<li>État de New York</li>
</region>
<settlement><li>Hambourg</li>
</settlement>
</list>
<tree><country name="Allemagne"><region name="Hambourg"><name sortKey="Kurtz, Stefan" sort="Kurtz, Stefan" uniqKey="Kurtz S" first="Stefan" last="Kurtz">Stefan Kurtz</name>
</region>
</country>
<country name="États-Unis"><region name="État de New York"><name sortKey="Narechania, Apurva" sort="Narechania, Apurva" uniqKey="Narechania A" first="Apurva" last="Narechania">Apurva Narechania</name>
</region>
<name sortKey="Narechania, Apurva" sort="Narechania, Apurva" uniqKey="Narechania A" first="Apurva" last="Narechania">Apurva Narechania</name>
<name sortKey="Stein, Joshua C" sort="Stein, Joshua C" uniqKey="Stein J" first="Joshua C" last="Stein">Joshua C. Stein</name>
<name sortKey="Ware, Doreen" sort="Ware, Doreen" uniqKey="Ware D" first="Doreen" last="Ware">Doreen Ware</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002A10 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002A10 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:2613927 |texte= A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:18976482" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.